

Eric Dong (ericd3), Michael Kwan (mk26), Srikar Nalamalapu (svn3)

Advanced Features Implemented:

- Tomasulo Algorithm
- N-way Cache (Real LRU)
- L2 Cache
- Local Branch Prediction
- Software Model
- Prefetcher
- Multiple Fetch
- Superscalar (in progress)

#### Tomasulo

- ROB size: 15
- Instruction queue size: 15, 128 for superscalar
- Reservation sizes: 15
- Load store queue size: 15
- Branch prediction table size: 128
- Parameterized ROB, IQ, etc.
- Multiple commits
- Multiple fetching for "super scalar" (limited to 2 for simplicity)
- Circular queue used for ROB, load store queue, instruction queue

#### Software Model

- Runs parallel to Tomasulo CPU (autonomous, based on ROB commit signal)
- Raises error when there is mismatch in PC and registers
- Helps catch many edge cases with many modules interacting with each other

### **Branch Prediction**

- Local Branch
  Prediction
- Uses two bit counter
- Keeps track of maximum 64 branch instructions at once
- Average successful prediction of 75-85%

#### L2 Cache

- Sits betweenArbiter and PhysicalMemory
- Larger than L1Cache (4x)
- Sees significant performance improvement compared to w/o

# N-Way Cache

- Given Cache was
  1-Hit Cycle but no
  LRU
- Implemented parameterizable set associative cache
- Real LRU instead of pseudo-LRU (Easier to implement)

#### Prefetcher

- Prefetches future requests for L1D cache
- Place future cache lines into L2 Cache
- Ideally miss at L1, hit at L2

# Multiple Fetch

- Increased i cache size to 64 (2 instruction)
- Increased instruction queue to accommodate

# **Quantitative Results**

| Tomasulo (36.67 Mhz) | Time (ns) | Power (mW)     | Score   |
|----------------------|-----------|----------------|---------|
| Comp 1               | 1369485   | 1946.72*       | 5.00E-9 |
| Comp 2               | 4415415   | 1946.72*       | 1.68E-7 |
| Comp 3               | 2763605   | 1946.72*       | 4.11E-8 |
|                      |           | Geometric Mean | 3.25E-8 |

| Competition Code | Branch Prediction Accuracy |
|------------------|----------------------------|
| Comp 1           | 86.29%                     |
| Comp 2           | 78.35%                     |
| Comp 3           | 78.46%                     |

# Cache Improvements using a Parametrized Cache

#### Times using Different Cache Sizes



#### IPC with different Cache Sizes



# Superscalar (Multiple Fetches) with different Cache Sizes



## What We Wish Had Done Differently

- Make a better load/store queue which can operate out of order
- Make graphs for cache hit rate
- Start earlier
- Leave more time for debugging code
- Created a software model at the very beginning of our project

Questions?